An Information-Theoretic Sentence Similarity Metric
Authors
Abstract
We describe an information-theoretic metric for sentence similarity. The method uses the information content (IC) of dependency triples, computed from corpus statistics generated by processing the Open American National Corpus (OANC) with the Stanford Parser. We define the similarity of two sentences as a function of (1) the similarity of their constituent dependency triples, and (2) the position of the triples in their respective dependency trees. We compare the results of the algorithm with human judgments of similarity for 1725 sentence pairs.
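The abstract does not give the formulas, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes IC is the negative log of a triple's smoothed corpus probability, that a triple's contribution decays with its depth in the dependency tree, and that two sentences are compared with a Lin-style overlap ratio over exactly matching triples. The function names, the `decay` parameter, the add-one smoothing, and the toy counts are all hypothetical.

```python
import math
from collections import Counter

def information_content(triple, counts, total):
    # IC(t) = -log p(t). Add-one smoothing (an assumption) keeps unseen triples finite.
    p = (counts[triple] + 1) / (total + len(counts) + 1)
    return -math.log(p)

def weighted_triples(sentence, counts, total, decay):
    # sentence: iterable of (triple, depth) pairs, depth = distance from the root.
    # Each triple is weighted by IC * decay**depth; keep the largest weight per triple.
    weights = {}
    for triple, depth in sentence:
        w = information_content(triple, counts, total) * decay ** depth
        weights[triple] = max(w, weights.get(triple, 0.0))
    return weights

def sentence_similarity(a, b, counts, decay=0.5):
    # Lin-style ratio (an assumption): shared weighted IC over total weighted IC.
    total = sum(counts.values())
    wa = weighted_triples(a, counts, total, decay)
    wb = weighted_triples(b, counts, total, decay)
    shared = sum(min(wa[t], wb[t]) for t in wa.keys() & wb.keys())
    denom = sum(wa.values()) + sum(wb.values())
    return 2.0 * shared / denom if denom else 0.0

# Toy corpus counts of (relation, head, dependent) triples; purely illustrative.
counts = Counter({
    ("nsubj", "barks", "dog"): 3,
    ("det", "dog", "the"): 50,
    ("nsubj", "meows", "cat"): 2,
    ("det", "cat", "the"): 40,
})
s1 = [(("nsubj", "barks", "dog"), 1), (("det", "dog", "the"), 2)]
s2 = [(("nsubj", "meows", "cat"), 1), (("det", "dog", "the"), 2)]
s3 = [(("nsubj", "meows", "cat"), 1), (("det", "cat", "the"), 2)]

print(sentence_similarity(s1, s1, counts))  # identical sentences
print(sentence_similarity(s1, s2, counts))  # one shared triple
print(sentence_similarity(s1, s3, counts))  # no shared triples
```

Exact matching is the simplest possible stand-in for "similarity of constituent dependency triples"; a fuller version would score partially matching triples (for example, same relation and head but a different dependent) rather than treating them as unrelated.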
Similar resources
Composite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
MAXSIM: An Automatic Metric for Machine Translation Evaluation Based on Maximum Similarity
This paper describes our participation in the NIST 2008 MetricsMATR Challenge, using our recently proposed automatic machine translation evaluation metric MAXSIM. The metric calculates a similarity score between a pair of English system-reference sentences by comparing information items such as ngrams across the sentence pair. Unlike most metrics, MAXSIM computes a similarity score between item...
An Effective Approach for Robust Metric Learning in the Presence of Label Noise
Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...
BIOSSES: a semantic sentence similarity estimation system for the biomedical domain
Motivation The amount of information available in textual format is rapidly increasing in the biomedical domain. Therefore, natural language processing (NLP) applications are becoming increasingly important to facilitate the retrieval and analysis of these data. Computing the semantic similarity between sentences is an important component in many NLP tasks including text retrieval and summariza...
MAXSIM: A Maximum Similarity Metric for Machine Translation Evaluation
We propose an automatic machine translation (MT) evaluation metric that calculates a similarity score (based on precision and recall) of a pair of sentences. Unlike most metrics, we compute a similarity score between items across the two sentences. We then find a maximum weight matching between the items such that each item in one sentence is mapped to at most one item in the other sentence. Th...
Journal title:
Volume, issue:
Pages: -
Publication date: 2015